We have been using urn models to motivate the use of probability models. Most data science applications are not related to data obtained from urns. More common are data that come from individuals. The reason probability plays a role here is because the data come from a random sample. The random sample is taken from a population and the urn serves as an analogy for the population.
Let’s revisit the heights dataset. Suppose we consider the males in our course the population.
library(dslabs)
data(heights)
x <- heights %>% filter(sex == "Male") %>% pull(height)
1. Mathematically speaking, x is our population. Using the urn analogy, we have an urn with the values of x in it. What are the average and standard deviation of our population?
library(magrittr)
library(dplyr)
library(dslabs)
data(heights)
x <- heights %>% filter(sex == "Male") %>% pull(height)
c(mean(x),sd(x))
## [1] 69.314755 3.611024
2. Call the population average computed above \(\mu\) and the standard deviation \(\sigma\). Now take a sample of size 50, with replacement, and construct an estimate for \(\mu\) and \(\sigma\).
library(tidyverse)
library(dslabs)
data(heights)
set.seed(1)
x <- heights %>% filter(sex == "Male") %>% pull(height)
mu<-mean(x)
sigma<-sd(x)
N<-50
x_N<-sample(x,size=N,replace=TRUE)
mu_bar<-mean(x_N)
sigma_bar<-sd(x_N)
c(mu,sigma)
## [1] 69.314755 3.611024
c(mu_bar,sigma_bar)
## [1] 70.472932 3.426742
3. What does the theory tell us about the sample average \(\bar{X}\) and how it is related to \(\mu\)?
- It is practically identical to \(\mu\).
- It is a random variable with expected value \(\mu\) and standard error \(\sigma/\sqrt{N}\).
- It is a random variable with expected value \(\mu\) and standard error \(\sigma\).
- Contains no information.
Answer b, as stated in chapter 16.2 “Data-driven models” [1]:
“For a large enough sample size \(N\), the probability distribution of the sample average \(\bar{X}\) is approximately normal with expected value \(\mu\) and standard error \(\sigma/\sqrt{N}\).”
This means that the error of the sample average shrinks in proportion to \(1/\sqrt{N}\) as the sample size \(N\) increases; this quantity is the standard error of the mean. If the variance of a single draw is \(V(X)=\sigma^2\), then the variance of the average is \(V(\bar{X})=\sigma^2/N\), and its square root is the standard error \(\sqrt{V(\bar{X})}=\sigma/\sqrt{N}\).
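The relation between the spread of the sample averages and \(\sigma/\sqrt{N}\) can be checked numerically. The following is a minimal sketch on a synthetic normal population; the mean 69 and standard deviation 3.6 are assumptions chosen only to resemble the heights data, and the variable names are illustrative.

```r
# Synthetic population; mean and sd are illustrative assumptions
set.seed(1)
population <- rnorm(100000, mean = 69, sd = 3.6)
sigma <- sqrt(mean((population - mean(population))^2))  # population SD

N <- 50      # sample size
B <- 10000   # number of Monte Carlo replicates

# Draw B samples of size N with replacement and store each sample average
x_bar <- replicate(B, mean(sample(population, size = N, replace = TRUE)))

# Empirical SD of the averages versus the theoretical standard error
c(empirical = sd(x_bar), theoretical = sigma / sqrt(N))
```

The two numbers should agree closely, whereas the standard deviation of the individual values stays near \(\sigma\) regardless of \(N\).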
4. So how is this useful? We are going to use an oversimplified yet illustrative example. Suppose we want to know the average height of our male students, but we only get to measure 50 of the 708. We will use \(\bar{X}\) as our estimate. We know from the answer to exercise 3 that the standard estimate of our error \(\bar{X}-\mu\) is \(\sigma/\sqrt{N}\). We want to compute this, but we don’t know \(\sigma\). Based on what is described in this section, show your estimate of \(\sigma\).
Section 16.2 “Data-driven models” [2] describes this:
“A problem is that we don’t know \(\sigma\). But theory tells us that we can estimate the urn model \(\sigma\) with the sample standard deviation defined as \(s = \sqrt{ \sum_{i=1}^N (X_i - \bar{X})^2 / (N-1)}\).”
library(tidyverse)
library(dslabs)
data(heights)
set.seed(1)
x <- heights %>% filter(sex == "Male") %>% pull(height)
mu<-mean(x)
N<-50
x_N<-sample(x,size=N,replace=TRUE)
mu_N<-mean(x_N)
sqrt(sum((x_N-mu_N)^2)/(N-1))
## [1] 3.426742
sd(x_N)
## [1] 3.426742
se_N<-sd(x_N)/sqrt(N)
se_N
## [1] 0.4846145
5. Now that we have an estimate of \(\sigma\), let’s call our estimate \(s\). Construct a 95% confidence interval for \(\mu\).
library(tidyverse)
library(dslabs)
data(heights)
set.seed(1)
x <- heights %>% filter(sex == "Male") %>% pull(height)
mu<-mean(x)
N<-50
x_N<-sample(x,size=N,replace=TRUE)
mu_N<-mean(x_N)
se_N<-sd(x_N)/sqrt(N)
lower_N<-mu_N-1.96*se_N
upper_N<-mu_N+1.96*se_N
data.frame(names=c("mu","mu_N","se_N","lower_N","upper_N"),values=c(mu,mu_N,se_N,lower_N,upper_N))
6. Now run a Monte Carlo simulation in which you compute 10,000 confidence intervals as you have just done. What proportion of these intervals include \(\mu\)?
The proportion should be close to 95%; the simulation below yields 0.948.
library(tidyverse)
library(dslabs)
data(heights)
set.seed(1)
x <- heights %>% filter(sex == "Male") %>% pull(height)
mu<-mean(x)
N<-50
B<-10000
take_sample <- function(N,x) {
x_N<-sample(x,size=N,replace=TRUE)
mu_N<-mean(x_N)
se_N<-sd(x_N)/sqrt(N)
c(mu_N-1.96*se_N,mu_N+1.96*se_N)
}
#take_sample(N,x)
interval <- replicate(B, {
take_sample(N,x)
})
df<-data.frame(t(interval),mu=replicate(B,mu))
df<-df %>% mutate(hit=ifelse(mu >= X1 & mu <=X2,1,0))
head(df)
mean(df$hit)
## [1] 0.948
7. In this section, we talked about pollster bias. We used visualization to motivate the presence of such bias. Here we will give it a more rigorous treatment. We will consider two pollsters that conducted daily polls. We will look at national polls for the month before the election.
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
  filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research", "The Times-Picayune/Lucid") &
           enddate >= "2016-10-15" &
           state == "U.S.") %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
We want to answer the question: is there a poll bias? Make a plot showing the spreads for each poll.
The term “poll/pollster bias” is introduced in chapter 16.1.2 “Pollster bias” [3]: “However, there appears to be differences across the polls. Note, for example, how the USC Dornsife/LA Times pollster is predicting a 4% win for Trump, while Ipsos is predicting a win larger than 5% for Clinton. The theory we learned says nothing about different pollsters producing polls with different expected values. All the polls should have the same expected value.”
Comparing the average spreads of different pollsters across all their polls therefore reveals a deviation that the theory does not predict.
The term itself was introduced by FiveThirtyEight. In a blog post they distinguish between “house effects” and “bias”: “Put another way: house effects are what we look at before the election; bias is what we look at after the election.” [4]
In this book, however, the term “pollster bias” denotes the effect that the polls of two pollsters have different expected values.
The computation below gives an average spread of 0.000625 for pollster 1, “Rasmussen Reports/Pulse Opinion Research”, whereas the average spread for pollster 2, “The Times-Picayune/Lucid”, is considerably larger at 0.052916.
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research","The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
polls %>% ggplot(aes(pollster, spread)) +
geom_point() +
coord_flip() +
geom_boxplot() +
scale_y_continuous(breaks = c(-0.02,0,0.02,0.04,0.06,0.08,0.10,0.12))
d_hat <- polls %>%
summarize(d_hat = sum(spread * samplesize) / sum(samplesize)) %>%
pull(d_hat)
p_hat <- (d_hat+1)/2
polls_sum<-polls %>% group_by(pollster) %>% summarize(ev=mean(spread), se = 2 * sqrt(p_hat * (1-p_hat) / median(samplesize)))
polls_sum<- polls_sum %>% mutate(diff_se=se[1]-se[2], diff_ev=ev[2]-ev[1])
polls_sum
#polls %>% mutate(d_hat = sum(spread * samplesize) / sum(samplesize)) %>% select(pollster,samplesize,spread,d_hat) %>% arrange(pollster)
#polls %>%
# ggplot(aes(spread)) +
# geom_histogram(color="black", binwidth = .02)
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(state == "U.S." & enddate >= "2016-10-31" &
(grade %in% c("A+","A","A-","B+") | is.na(grade)))
polls <- polls %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
d_hat <- polls %>%
summarize(d_hat = sum(spread * samplesize) / sum(samplesize)) %>%
pull(d_hat)
p_hat <- (d_hat+1)/2
d_hat
## [1] 0.01426264
p_hat
## [1] 0.5071313
moe <- 1.96 * 2 * sqrt(p_hat * (1 - p_hat) / sum(polls$samplesize))
polls %>% select(pollster,samplesize,spread) %>% arrange(pollster)
polls %>%
ggplot(aes(spread)) +
geom_histogram(color="black", binwidth = .01)
polls %>% ggplot(aes(pollster, spread)) +
geom_point() +
coord_flip() +
#geom_boxplot() +
scale_y_continuous(breaks = c(-0.02,0,0.02,0.04,0.06,0.08,0.10,0.12))
Is the observed bias large or not?
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(state == "U.S." & enddate >= "2016-10-31" ) %>% mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
#polls %>% filter(pollster == "USC Dornsife/LA Times") %>% select(pollster,samplesize,spread)
polls_max<-polls %>% group_by(pollster) %>% summarise(startdate=max(startdate),enddate=max(enddate),spread=max(spread),samplesize=max(samplesize))
polls_max
data(polls_us_election_2016)
polls<-polls_us_election_2016 %>%
filter(enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
polls %>% ggplot(aes(pollster, spread)) +
geom_point() +
coord_flip() +
#geom_boxplot() +
scale_y_continuous(breaks = c(-0.02,0,0.02,0.04,0.06,0.08,0.10,0.12))
d_hat <- polls %>%
summarize(d_hat = sum(spread * samplesize) / sum(samplesize)) %>%
pull(d_hat)
p_hat <- (d_hat+1)/2
polls_sum<-polls %>% group_by(pollster) %>%
filter(n() >= 6) %>%
summarize(ev=mean(spread), se = 2 * sqrt(p_hat * (1-p_hat) / median(samplesize))) %>% arrange(se)
polls_sum
polls_mx<-union(polls_sum %>% head(1),polls_sum %>% tail(1))
polls_mx<- polls_mx %>% mutate(diff_se=se[2]-se[1], diff_ev=ev[2]-ev[1])
polls_mx
polls<-polls %>% filter(pollster %in% polls_mx$pollster)
polls %>% ggplot(aes(pollster, spread)) +
geom_point() +
coord_flip() +
#geom_boxplot() +
scale_y_continuous(breaks = c(-0.06,-0.04,-0.02,0,0.02,0.04,0.06,0.08,0.10,0.12))
8. The data does seem to suggest there is a difference. However, these data are subject to variability. Perhaps the differences we observe are due to chance.
The urn model theory says nothing about pollster effect. Under the urn model, both pollsters have the same expected value: the election day difference, that we call \(d\).
To answer the question “is there an urn model?”, we will model the observed data \(Y_{i,j}\) in the following way:
\[ Y_{i,j} = d + b_i + \varepsilon_{i,j} \]
with \(i=1,2\) indexing the two pollsters, \(b_i\) the bias for pollster \(i\) and \(\varepsilon_{i,j}\) poll to poll chance variability. We assume the \(\varepsilon\) are independent from each other, have expected value \(0\) and standard deviation \(\sigma_i\) regardless of \(j\).
Which of the following best represents our question?
- Is \(\varepsilon_{i,j}\) = 0?
- How close are the \(Y_{i,j}\) to \(d\)?
- Is \(b_1 \neq b_2\)?
- Are \(b_1 = 0\) and \(b_2 = 0\) ?
Answer c. The term \(b_i\) is the bias of pollster \(i\): the systematic amount by which its polls deviate from the true election-day difference \(d\). Under the urn model both pollsters have the same expected value, so the question is whether \(b_1 \neq b_2\).
If the bias of a pollster were zero, its polls would have expected value \(d\). In other words, on average they would match the true spread in the population, although individual polls would still vary by chance.
The variable \(\varepsilon_{i,j}\) is the poll-to-poll chance variability of poll \(j\) from pollster \(i\); it has expected value 0 and does not describe a bias. The systematic part \(b_i\) is what is referred to as the “house effect”.
The confusing part of this question is that the chapter in the book discusses only the difference between pollsters, which corresponds to \(b_2 - b_1\), and not the bias of a single pollster relative to the true result (\(b_i\)), which cannot be determined while \(d\) is unknown.
The first sentences of the question state the null hypothesis \(H_0\): “The observed difference of the polls, the pollster bias, is due to chance.”
There are two sets of samples, \(Y_{1,j}\) and \(Y_{2,j}\), and under the null hypothesis \(b_1 = b_2\), so that any difference between the two averages is due to chance. In other words, we ask how probable a difference at least as large as the observed one would be if only chance were at work.
The null hypothesis assumes that the observed effect is random. This would mean that the effect tends to disappear if the experiment is repeated. The question is therefore how reliable the observed effect is, and whether it persists when the experiment is repeated.
The alternative hypothesis, in contrast, states that the observed effect is driven by a cause within the sampled system. For this reason, the researcher aims to show that the null hypothesis can be rejected, in order to demonstrate that the observed effect is persistent and caused by something within the investigated system.
This is the difference between descriptive statistics and inferential statistics. The former describes the actual sample, whereas the latter draws conclusions about future samples and about the population from which the sample was taken.
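To illustrate what “due to chance” means, one can simulate the model \(Y_{i,j} = d + b_i + \varepsilon_{i,j}\) under the null hypothesis \(b_1 = b_2\). This is only a sketch: all numbers (true spread, standard deviations, numbers of polls) are hypothetical values chosen for illustration, and normally distributed errors are an additional assumption.

```r
set.seed(1)
d     <- 0.02           # hypothetical true spread
b     <- c(0, 0)        # null hypothesis: both pollsters share the same bias
sigma <- c(0.02, 0.02)  # hypothetical poll-to-poll SD per pollster
n     <- c(16, 24)      # hypothetical number of polls per pollster

# One simulated difference of averages, Y_bar_2 - Y_bar_1
sim_diff <- function() {
  y1 <- d + b[1] + rnorm(n[1], mean = 0, sd = sigma[1])
  y2 <- d + b[2] + rnorm(n[2], mean = 0, sd = sigma[2])
  mean(y2) - mean(y1)
}

diffs <- replicate(10000, sim_diff())

# Under the null the differences scatter around 0
mean(diffs)
# Proportion of simulated differences at least as extreme as 0.05
mean(abs(diffs) >= 0.05)
```

Under these assumed settings, a difference of about 0.05 between the two averages practically never arises by chance alone.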
9. In the right side of this model only \(\varepsilon_{i,j}\) is a random variable. The other two are constants. What is the expected value of \(Y_{1,j}\)?
To answer this question, two facts are used:
First: \(d\) and \(b_i\) are constants, so their expected values are the constants themselves.
Second: as stated in the previous question, “We assume the \(\varepsilon\) are independent from each other, have expected value \(0\) and standard deviation \(\sigma_i\) regardless of \(j\).”
\[ E[Y_{1,j}] = E[d + b_1 + \varepsilon_{1,j}] = d + b_1 + E[\varepsilon_{1,j}] \]
From the second fact it follows that \(E[\varepsilon_{1,j}] = 0\) for every \(j\). Therefore, the expected value is
\[ E[Y_{1,j}] = d + b_1. \]
10. Suppose we define \(\bar{Y}_1\) as the average of poll results from the first pollster, \(Y_{1,1},\dots,Y_{1,N_1}\) with \(N_1\) the number of polls conducted by the first pollster:
polls %>%
  filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
  summarize(N_1 = n())
What is the expected value of \(\bar{Y}_1\)?
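Because the expected value is linear and every poll of the first pollster has expected value \(d + b_1\) (previous question), it follows that:
\[ E[\bar{Y}_1] = \frac{1}{N_1}\sum_{j=1}^{N_1} E[Y_{1,j}] = d + b_1 \]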
library(dslabs)
options(scipen = 10)
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research",
"The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
n<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(N_1 = n()) %>% pull()
# Average spread of the first pollster (named y_bar so that the function mean() is not masked)
y_bar<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(Y_1 = mean(spread)) %>% pull()
data.frame(names=c("n","mean"),values=c(n,y_bar))
11. What is the standard error of \(\bar{Y}_1\) ?
From questions 3 and 4 it follows that the standard error of an average is the standard deviation of a single observation divided by the square root of the number of observations. Each poll \(Y_{1,j}\) has variance \(V(Y_{1,j}) = \sigma_1^2\), and there are \(N_1\) polls, so:
\[ SE[\bar{Y}_1] = \sigma_1/\sqrt{N_1} \]
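The step from the standard deviation of a single poll to the standard error of the average can be written out as follows, assuming (as in the model) that the \(\varepsilon_{1,j}\) are independent with common standard deviation \(\sigma_1\), and that \(d\) and \(b_1\) are constants that contribute no variance:

\[
\mbox{Var}(\bar{Y}_1)
= \mbox{Var}\left(\frac{1}{N_1}\sum_{j=1}^{N_1} Y_{1,j}\right)
= \frac{1}{N_1^2}\sum_{j=1}^{N_1} \mbox{Var}(\varepsilon_{1,j})
= \frac{\sigma_1^2}{N_1},
\qquad
SE[\bar{Y}_1] = \sqrt{\mbox{Var}(\bar{Y}_1)} = \frac{\sigma_1}{\sqrt{N_1}}
\]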
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research",
"The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
n<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(N_1 = n()) %>% pull()
sdev<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(Y_1 = sd(spread)) %>% pull()
# Standard error of the average: sample SD divided by sqrt(number of polls)
se<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(Y_1 = sd(spread)/sqrt(n)) %>% pull()
data.frame(names=c("n","sdev","se"),values=c(n,sdev,se))
12. Suppose we define \(\bar{Y}_2\) as the average of poll results from the second pollster, \(Y_{2,1},\dots,Y_{2,N_2}\) with \(N_2\) the number of polls conducted by the second pollster. What is the expected value of \(\bar{Y}_2\)?
Equivalent to question 10.
\[ E[\bar{Y}_2] = d + b_2 \]
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research",
"The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
n<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(N_1 = n()) %>% pull()
# Average spread of the second pollster (named y_bar so that the function mean() is not masked)
y_bar<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(Y_2 = mean(spread)) %>% pull()
data.frame(names=c("n","mean"),values=c(n,y_bar))
13. What is the standard error of \(\bar{Y}_2\) ?
Equivalent to question 11.
\[ SE[\bar{Y}_2] = \sigma_2/\sqrt{N_2} \]
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research",
"The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
n<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(N_1 = n()) %>% pull()
sdev<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(Y_1 = sd(spread)) %>% pull()
# Standard error of the average: sample SD divided by sqrt(number of polls)
se<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(Y_2 = sd(spread)/sqrt(n)) %>% pull()
data.frame(names=c("n","sdev","se"),values=c(n,sdev,se))
14. Using what we learned by answering the questions above, what is the expected value of \(\bar{Y}_{2} - \bar{Y}_1\)?
The linearity of the expected value is used here: \(E[\bar{Y}_{2} - \bar{Y}_1] = E[\bar{Y}_{2}] - E[\bar{Y}_1]\).
\[ E[\bar{Y}_{2} - \bar{Y}_1] = (d+b_2)-(d+b_1) = b_2-b_1 \]
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research",
"The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
meanY1<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(Y_1 = mean(spread)) %>% pull()
meanY2<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(Y_2 = mean(spread))%>% pull()
data.frame(names=c("E[Y_1]","E[Y_2]","E[Y_2-Y_1]"),values=c(meanY1,meanY2,meanY2-meanY1))
15. Using what we learned by answering the questions above, what is the standard error of \(\bar{Y}_{2} - \bar{Y}_1\)?
Unlike the expected value, the standard error is not linear: the standard errors of two averages cannot simply be subtracted. Two auxiliary facts are used instead. First, the standard error of an average is the square root of its variance, \(\mbox{SE}[\bar{X}] = +\sqrt{\mbox{Var}(\bar{X})}\). Second, for independent random variables the variance of a difference is the sum of the variances, \(\mbox{Var}(\bar{X}-\bar{Y}) = \mbox{Var}(\bar{X})+\mbox{Var}(\bar{Y})\), because \(\mbox{Var}(-\bar{Y}) = \mbox{Var}(\bar{Y})\).
Putting it all together: a single observation has variance \(\mbox{Var}(X)=\sigma^2\) and standard deviation \(\mbox{SD}[X]=\sigma\), while the average of \(N\) independent observations has variance \(\mbox{Var}(\bar{X})=\sigma^2/N\) and standard error \(\mbox{SE}[\bar{X}]=\sigma/\sqrt{N}\).
\[ SE[\bar{Y}_{2} - \bar{Y}_1] = \sqrt{SE[\bar{Y}_2]^2+SE[\bar{Y}_1]^2} = \sqrt{\frac{\sigma_2^2}{N_2}+\frac{\sigma_1^2}{N_1}} \]
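The formula for the standard error of a difference can be checked with a short simulation. This is a sketch with hypothetical standard deviations and numbers of polls, and normally distributed polls as a simplifying assumption; `se_naive` is included only to show that subtracting standard errors gives the wrong answer.

```r
set.seed(1)
sigma_1 <- 0.02; N_1 <- 16   # hypothetical values for pollster 1
sigma_2 <- 0.03; N_2 <- 24   # hypothetical values for pollster 2

# Simulate many differences of two independent sample averages
diffs <- replicate(20000,
                   mean(rnorm(N_2, 0, sigma_2)) - mean(rnorm(N_1, 0, sigma_1)))

se_theory <- sqrt(sigma_2^2 / N_2 + sigma_1^2 / N_1)     # variances add
se_naive  <- sigma_2 / sqrt(N_2) - sigma_1 / sqrt(N_1)   # wrong: SEs do not subtract

c(simulated = sd(diffs), theory = se_theory, naive = se_naive)
```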
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research",
"The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
nY1<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(N_1 = n()) %>% pull()
sigmaY1<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(Y_1 = sd(spread)) %>% pull()
nY2<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(N_1 = n()) %>% pull()
sigmaY2<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(Y_1 = sd(spread)) %>% pull()
SE<-sqrt(sigmaY1^2/nY1+sigmaY2^2/nY2)
data.frame(names=c("nY1","sigmaY1","nY2","sigmaY2","SE"), values=c(nY1,sigmaY1,nY2,sigmaY2,SE))
16. The answer to the question above depends on \(\sigma_1\) and \(\sigma_2\), which we don’t know. We learned that we can estimate these with the sample standard deviation. Write code that computes these two estimates.
In chapter 16.2 “Data-driven models” [5] it is explained that: “A problem is that we don’t know \(\sigma\). But theory tells us that we can estimate the urn model \(\sigma\) with the sample standard deviation defined as \(s = \sqrt{ \sum_{i=1}^N (X_i - \bar{X})^2 / (N-1)}\).” Here, the sample standard deviation \(s\) can be computed with the function sd().
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research","The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
polls <- polls %>% select(pollster, samplesize,spread)
polls %>% group_by(pollster) %>% summarise(sigma=sd(spread))
17. What does the CLT tell us about the distribution of \(\bar{Y}_2 - \bar{Y}_1\)?
- Nothing because this is not the average of a sample.
- Because the \(Y_{ij}\) are approximately normal, so are the averages.
- Note that \(\bar{Y}_2\) and \(\bar{Y}_1\) are sample averages, so if we assume \(N_2\) and \(N_1\) are large enough, each is approximately normal. The difference of normals is also normal.
- The data are not 0 or 1, so CLT does not apply.
Answer c.
For the first part, that the averages are approximately normally distributed if the number of polls \(N_i\) is large enough, chapter 15.4 “Central Limit Theorem in practice” states: “The CLT tells us that the distribution function for a sum of draws is approximately normal. We also learned that dividing a normally distributed random variable by a constant is also a normally distributed variable. This implies that the distribution of \(\bar{X}\) is approximately normal.” [6]
The second part rests on the fact that sums and differences of independent normally distributed random variables are again normally distributed. Since \(\bar{Y}_1\) and \(\bar{Y}_2\) come from different pollsters and are assumed to be independent, their difference is approximately normal as well.
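The claim that the difference of two sample averages is approximately normal, even when the individual observations are not, can be probed with a simulation. The sketch below uses exponentially distributed observations and hypothetical sample sizes; neither is taken from the polling data.

```r
set.seed(1)
N_1 <- 40; N_2 <- 60   # hypothetical numbers of observations

# Individual observations are deliberately non-normal (exponential, SD = 1)
sim_diff <- function() mean(rexp(N_2, rate = 1)) - mean(rexp(N_1, rate = 1))
diffs <- replicate(20000, sim_diff())

# Standard error of the difference: variances of the two averages add
se <- sqrt(1 / N_2 + 1 / N_1)

# If the difference is approximately normal with mean 0,
# about 95% of the simulated values fall within 1.96 standard errors
coverage <- mean(abs(diffs) <= 1.96 * se)
coverage
```

A histogram or qqnorm(diffs) would give a visual version of the same check.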
18. We have constructed a random variable that has expected value \(b_2 - b_1\), the pollster bias difference. If our model holds, then this random variable has an approximately normal distribution and we know its standard error. The standard error depends on \(\sigma_1\) and \(\sigma_2\), but we can plug the sample standard deviations we computed above. We started off by asking: is \(b_2 - b_1\) different from 0? Use all the information we have learned above to construct a 95% confidence interval for the difference \(b_2\) and \(b_1\).
Recall the necessary equations: the expectation value for the difference of the pollster bias \(E[\bar{Y}_{2} - \bar{Y}_1] = b_2-b_1\) from question 14 and the corresponding standard error \(SE[\bar{Y}_{2} - \bar{Y}_1] = \sqrt{\frac{\sigma_2^2}{N_2}+\frac{\sigma_1^2}{N_1}}\) from question 15.
Furthermore, the 95% confidence interval is given by \(Pr(\bar{Y}_{2} - \bar{Y}_1 - 1.96 \cdot SE[\bar{Y}_{2} - \bar{Y}_1] \leq b_2-b_1 \leq \bar{Y}_{2} - \bar{Y}_1 + 1.96 \cdot SE[\bar{Y}_{2} - \bar{Y}_1]) = 0.95\). The question “[…] is \(b_2-b_1\) different from 0? […]” refers to exercise 8, which asks whether \(b_2 \neq b_1\); under the urn model both biases would be equal.
In chapter 16.1.2 “Pollster bias” [7] it is stated that:
“The theory we learned says nothing about different pollsters producing polls with different expected values. All the polls should have the same expected value.”
Therefore, the observed difference of the averages, \(\bar{Y}_2-\bar{Y}_1 \approx 0.05 = 5\)%, is our estimate of \(b_2-b_1\), the size of the pollster effect for the two pollsters in this example.
The confidence interval (CI) from 0.0385 to 0.0660 is narrow, with a width of 0.02756 (\(=\mbox{CI}_2-\mbox{CI}_1\)). It does not contain the null value 0, so the null hypothesis can be rejected. It follows that the pollster bias cannot be explained by randomness alone.
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research",
"The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
meanY1<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(Y_1 = mean(spread)) %>% pull()
meanY2<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(Y_2 = mean(spread))%>% pull()
nY1<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(N_1 = n()) %>% pull()
sigmaY1<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(Y_1 = sd(spread)) %>% pull()
nY2<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(N_1 = n()) %>% pull()
sigmaY2<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(Y_1 = sd(spread)) %>% pull()
SE<-sqrt(sigmaY1^2/nY1+sigmaY2^2/nY2)
EY2Y1<-meanY2-meanY1
ci1<-EY2Y1-1.96*SE
ci2<-EY2Y1+1.96*SE
data.frame(names=c("CI2-CI1","CI1","CI2","EY2Y1","SE"), values=c(ci2-ci1,ci1,ci2,EY2Y1,SE))
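The repeated filter/summarize steps above can be condensed into a small helper. The function `diff_ci` below is a hypothetical name introduced for this sketch, not part of dslabs or the tidyverse, and the usage example runs on simulated spreads rather than the polling data.

```r
# Hypothetical helper: normal-approximation CI for the difference of two group averages
diff_ci <- function(y1, y2, level = 0.95) {
  est <- mean(y2) - mean(y1)
  se  <- sqrt(var(y2) / length(y2) + var(y1) / length(y1))
  q   <- qnorm(1 - (1 - level) / 2)   # approximately 1.96 for level = 0.95
  c(estimate = est, lower = est - q * se, upper = est + q * se)
}

# Usage on simulated spreads (all values are illustrative assumptions)
set.seed(1)
y1 <- rnorm(16, mean = 0.00, sd = 0.02)
y2 <- rnorm(24, mean = 0.05, sd = 0.02)
ci <- diff_ci(y1, y2)
ci
```

With the polls table from this exercise, the same helper could be called on the spread values of each pollster, for example polls$spread[polls$pollster == "The Times-Picayune/Lucid"] as y2.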
19. The confidence interval tells us there is relatively strong pollster effect resulting in a difference of about 5%. Random variability does not seem to explain it. We can compute a p-value to relay the fact that chance does not explain it. What is the p-value?
The p-value is computed by first dividing the observed difference \(\bar{Y}_{2} - \bar{Y}_1\), our estimate of \(b_2-b_1\), by its standard error \(SE[\bar{Y}_{2} - \bar{Y}_1]\). Under the null hypothesis \(b_2-b_1=0\) this yields the ratio:
\[ z=\frac{\bar{Y}_{2} - \bar{Y}_1}{SE[\bar{Y}_{2} - \bar{Y}_1]}=\frac{\bar{Y}_{2} - \bar{Y}_1}{\sqrt{s_2^2/N_2+s_1^2/N_1}} \]
Since the actual \(\sigma_i\) are not known, they are approximated by the sample standard deviations \(s_i\). The ratio is then passed to the pnorm() function to compute the p-value.
In chapter 15.9 “p-values” [8] the concept was introduced by computing the ratio \(z=\frac{\mid\bar{X}-E[\bar{X}]\mid}{SE[\bar{X}]}=\frac{\mid \bar{X} - p\mid}{\sqrt{p(1-p)/N}}\). The ratio \(z\) was then plugged into the standard normal distribution function, 1 - (pnorm(z) - pnorm(-z)), to obtain the p-value.
This method of computing the p-value is also applied in chapter 15.10.5 “Confidence intervals for the odds ratio” [9], in which the confidence interval is computed along with the p-value. There, the log of the odds ratio \(\frac{a/c}{b/d}\) of a 2x2 contingency table, divided by its standard error, plays the role of \(z\) in the function 2*(1 - pnorm(z)).
| | Parameter 1 | Parameter 2 |
|---|---|---|
| Observation 1 | a | b |
| Observation 2 | c | d |
Likewise, in the first question of the exercises in chapter 15.11 [10], the p-value is computed from the chi-square statistic of a 2x2 table. The statistic is \(\chi^2 = \sum_{i=1}^N \frac{(O_i-E_i)^2}{E_i}\), where \(O_i\) are the observed and \(E_i\) the expected counts, summed over the \(N\) cells of the table. With \(z=\chi^2\), the p-value is the upper tail of the chi-square distribution, 1 - pchisq(z, df), with df = 1 for a 2x2 table.
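The chi-square computation can be sketched by hand and compared with the built-in chisq.test(). The counts in the table are hypothetical and not taken from the book.

```r
# Hypothetical 2x2 table of observed counts (not from the book)
observed <- matrix(c(30, 20,
                     10, 40), nrow = 2, byrow = TRUE)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

chi_sq <- sum((observed - expected)^2 / expected)
dof    <- (nrow(observed) - 1) * (ncol(observed) - 1)   # 1 for a 2x2 table
p_val  <- 1 - pchisq(chi_sq, dof)                       # upper tail

# Cross-check against the built-in test without continuity correction
builtin <- chisq.test(observed, correct = FALSE)
c(manual = chi_sq, builtin = unname(builtin$statistic))
c(manual = p_val, builtin = builtin$p.value)
```

Note that chisq.test() applies Yates' continuity correction to 2x2 tables by default, which is why correct = FALSE is set for the comparison.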
A sample is examined in order to gain insight into the whole population. It is therefore crucial to know which distribution the statistic follows, since only then can the sample results be transferred to the whole population.
In other words, the sample under investigation is only useful if its results can be extrapolated to the total population.
library(dslabs)
library(tidyverse)
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(pollster %in% c("Rasmussen Reports/Pulse Opinion Research",
"The Times-Picayune/Lucid") &
enddate >= "2016-10-15" &
state == "U.S.") %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
meanY1<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(Y_1 = mean(spread)) %>% pull()
meanY2<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(Y_2 = mean(spread))%>% pull()
nY1<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(N_1 = n()) %>% pull()
sigmaY1<-polls %>%
filter(pollster=="Rasmussen Reports/Pulse Opinion Research") %>%
summarize(Y_1 = sd(spread)) %>% pull()
nY2<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(N_1 = n()) %>% pull()
sigmaY2<-polls %>%
filter(pollster=="The Times-Picayune/Lucid") %>%
summarize(Y_1 = sd(spread)) %>% pull()
SE<-sqrt(sigmaY1^2/nY1+sigmaY2^2/nY2)
EY2Y1<-meanY2-meanY1
ci1<-EY2Y1-1.96*SE
ci2<-EY2Y1+1.96*SE
z<- EY2Y1/SE
p_val<-1-(pnorm(z)-pnorm(-z))
data.frame(names=c("p_val","z","CI1","CI2","CI_width","EY1","EY2","EY2Y1","SE"), values=c(p_val,z,ci1,ci2,ci2-ci1,meanY1,meanY2,meanY2-meanY1,SE))
The computation yields a ratio of \(z=7.44\) and a very small p-value of \(1.03 \times 10^{-13}\), far below the threshold of 0.05, so the null hypothesis is rejected. It follows that the pollster effect cannot be explained by chance alone; there is a systematic difference between the two pollsters.
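The two expressions for the two-sided p-value used in this exercise are equivalent, which a short sketch can confirm. The helper name `p_two_sided` is an assumption introduced here for illustration.

```r
# Two-sided p-value for a z statistic under the standard normal distribution
p_two_sided <- function(z) 2 * (1 - pnorm(abs(z)))

z <- 1.96
p_a <- p_two_sided(z)
p_b <- 1 - (pnorm(z) - pnorm(-z))   # form used in the code above
c(p_a, p_b)

# For very large |z|, 1 - pnorm(z) loses numerical precision;
# asking pnorm() for the upper tail directly is more accurate
2 * pnorm(abs(7.44), lower.tail = FALSE)
```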
20. The statistic formed by dividing our estimate of \(b_2-b_1\) by its estimated standard error:
\[ \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{s_2^2/N_2 + s_1^2/N_1}} \]
is called the t-statistic. Now notice that we have more than two pollsters. We can also test for pollster effect using all pollsters, not just two. The idea is to compare the variability across polls to variability within polls. We can actually construct statistics to test for effects and approximate their distribution. The area of statistics that does this is called Analysis of Variance or ANOVA. We do not cover it here, but ANOVA provides a very useful set of tools to answer questions such as: is there a pollster effect?
For this exercise, create a new table:
polls <- polls_us_election_2016 %>%
  filter(enddate >= "2016-10-15" & state == "U.S.") %>%
  group_by(pollster) %>%
  filter(n() >= 5) %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100) %>%
  ungroup()
Compute the average and standard deviation for each pollster and examine the variability across the averages and how it compares to the variability within the pollsters, summarized by the standard deviation.
\(t = \frac{\bar{X}-\bar{Y}}{\mbox{SE}[\bar{X}-\bar{Y}]} = \frac{\mbox{difference of the averages}}{\mbox{standard error of the difference}}\), which compares the variability across the pollsters (numerator) with the variability within the pollsters (denominator).
Two clearly different groups yield a large absolute t-value, whereas two similar groups yield a small one. Whether a t-value counts as large or small, i.e. its significance, can be judged from the p-value of the t-distribution. A p-value below 5% indicates that the observed difference is unlikely to be due to randomness alone.
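The question mentions ANOVA as the tool for comparing more than two pollsters. The following is a minimal sketch using base R's aov() on simulated spreads for three hypothetical pollsters "A", "B" and "C"; the biases and standard deviations are assumptions, and this is not the analysis carried out in the discussion below.

```r
set.seed(1)
# Simulated spreads for three hypothetical pollsters with different biases
sim <- data.frame(
  pollster = rep(c("A", "B", "C"), each = 10),
  spread   = c(rnorm(10, 0.00, 0.02), rnorm(10, 0.03, 0.02), rnorm(10, 0.05, 0.02))
)

# Variability across pollsters: SD of the per-pollster averages
across_sd <- sd(tapply(sim$spread, sim$pollster, mean))
# Variability within pollsters: SD of the polls of each pollster
within_sd <- tapply(sim$spread, sim$pollster, sd)
across_sd
within_sd

# One-way ANOVA: the F statistic compares between-group to within-group variability
fit <- aov(spread ~ pollster, data = sim)
summary(fit)
```

A small p-value in the summary table would suggest that at least one pollster differs systematically from the others.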
Discussion:
library(dslabs)
library(tidyverse)
options(scipen = 50)
polls <- polls_us_election_2016 %>%
filter(enddate >= "2016-10-15" &
state == "U.S.") %>%
group_by(pollster) %>%
filter(n() >= 5) %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
#%>% ungroup()
polls_grp <- polls %>% select(pollster,samplesize,spread) %>% group_by(pollster) %>% summarize(EY=mean(spread),sigmaY=sd(spread),nY = n(),sigsqN=sigmaY^2/nY) %>% arrange(sigsqN)
polls_grp
Crosstable data
polls_t<-polls_grp %>% select(pollster,sigsqN,EY)
x<-data.frame(a=polls_t)
y<-data.frame(b=polls_t)
polls_merge <-merge(x,y,all=TRUE) %>% mutate(SE=sqrt(a.sigsqN+b.sigsqN),EXEY=b.EY-a.EY,CI1=EXEY-1.96*SE,CI2=EXEY+1.96*SE, z=EXEY/SE,tstat=2*(1-pnorm(z))) %>% select(a.pollster,b.pollster, SE,EXEY,CI1,CI2,z,tstat) %>% arrange(abs(z))
polls_merge<-polls_merge %>% filter(a.pollster!=b.pollster) %>% filter(z>0) #%>% filter(tstat<0.05)
polls_merge<-polls_merge %>% mutate(c.pollster=paste(a.pollster,"_",b.pollster)) %>% mutate(significant=as.factor(ifelse(tstat<0.05,"yes","no"))) %>% mutate(no_null_value=as.factor(ifelse(CI1>0,"yes","no")))
polls_merge
Compare result of confidence interval with p-value
polls_merge %>% select(c.pollster,significant,no_null_value) %>% mutate(diff=ifelse(significant!=no_null_value,1,0)) %>% summarise(d=sum(diff))
p-value for each pollster combination :
ggplot(polls_merge,aes(a.pollster, tstat, color=b.pollster))+
geom_point() +
coord_flip()
ggplot(polls_merge,aes(b.pollster, tstat, color=a.pollster))+
geom_point() +
coord_flip()
Another plot of the p-value for each pollster-pollster combination:
ggplot(polls_merge,aes(c.pollster, tstat, color=significant))+
geom_point() +
coord_flip()
Third plot of p-value on a grid:
ggplot(polls_merge,aes(a.pollster, tstat, color=significant))+
geom_point() +
coord_flip() +
facet_wrap(~b.pollster, ncol=2) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
confidence interval:
ggplot(polls_merge,aes(a.pollster, EXEY, color=no_null_value))+
geom_point() +
coord_flip() +
geom_errorbar(aes(ymin=CI1, ymax=CI2)) +
facet_wrap(~b.pollster, ncol=2) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
scatter plots of available numerical variables, to discover unforeseen relationships.
polls_merge %>% select(-a.pollster, -b.pollster, -c.pollster, -significant, -no_null_value) %>% plot()
1. In 1999, in England, Sally Clark11 was found guilty of the murder of two of her sons. Both infants were found dead in the morning, one in 1996 and another in 1998. In both cases, she claimed the cause of death was sudden infant death syndrome (SIDS). No evidence of physical harm was found on the two infants so the main piece of evidence against her was the testimony of Professor Sir Roy Meadow, who testified that the chances of two infants dying of SIDS was 1 in 73 million. He arrived at this figure by finding that the rate of SIDS was 1 in 8,500 and then calculating that the chance of two SIDS cases was 8,500 \(\times\) 8,500 \(\approx\) 73 million. Which of the following do you agree with?
- Sir Meadow assumed that the probability of the second son being affected by SIDS was independent of the first son being affected, thereby ignoring possible genetic causes. If genetics plays a role then: \(\mbox{Pr}(\mbox{second case of SIDS} \mid \mbox{first case of SIDS}) > \mbox{Pr}(\mbox{first case of SIDS})\).
- Nothing. The multiplication rule always applies in this way: \(\mbox{Pr}(A \mbox{ and } B) =\mbox{Pr}(A)\mbox{Pr}(B)\)
- Sir Meadow is an expert and we should trust his calculations.
- Numbers don’t lie.
Answer a.
Ignoring possible correlation between two possibly dependent events might lead to false conclusions.
2. Let’s assume that there is in fact a genetic component to SIDS and that \(\mbox{Pr}(\mbox{second case of SIDS} \mid \mbox{first case of SIDS}) = 1/100\), which is much higher than 1 in 8,500. What is the probability of both of her sons dying of SIDS?
The probability of two siblings both dying of SIDS is 1 in 850,000, or about 1.18e-06.
\[ \mbox{Pr}(\mbox{first and second case of SIDS}) = \mbox{Pr}(\mbox{second case of SIDS} \mid \mbox{first case of SIDS}) \cdot \mbox{Pr}(\mbox{first case of SIDS}) = 1/100 \cdot 1/8500 = 1/850000 \]
The numerical calculation is shown below.
options(scipen = 1)
PrA <- 1/8500
PrBA <- 1/100
PrB <- PrA * PrBA
PrB
## [1] 1.176471e-06
3. Many press reports stated that the expert claimed the probability of Sally Clark being innocent as 1 in 73 million. Perhaps the jury and judge also interpreted the testimony this way. This probability can be written as the probability of a mother is a son-murdering psychopath given that two of her children are found dead with no evidence of physical harm. According to Bayes’ rule, what is this?
The question states the hypothesis that the jury interpreted “1 in 73 million” as the probability of Sally Clark being innocent, not as the probability of two infants dying of SIDS. The misconception was therefore that the probability of Sally Clark being innocent is 1 in 73 million.
How can this hypothesis be modeled?
Let the hypothesis \(H\) be that the mother is the murderer: “mother murderer” = “a mother is a son-murdering psychopath”. Let the event \(E\) be the two dead children: “dead children” = “two of her children are found dead with no evidence of physical harm”. Then Bayes’ rule can be written as follows.
\[ \mbox{Pr}(\mbox{mother murderer} \mid \mbox{dead children}) = \mbox{Pr}(\mbox{dead children} \mid \mbox{mother murderer}) \cdot \frac{\mbox{Pr}(\mbox{mother murderer})}{\mbox{Pr}(\mbox{dead children})} \]
The relationship between the two events is modeled by the expanded version of Bayes’ theorem, in which \(\mbox{not } H\) denotes the complement of \(H\); in other words, \(H\) is the case that the hypothesis is true and \(\mbox{not } H\) is the case that it is false. The expression \(\mbox{Pr}(H \mid E)\) is read as the probability that the hypothesis \(H\) is true given that \(E\) has occurred, where \(E\) denotes the evidence, data or facts.
The terms are named as follows: \(\mbox{Pr}(E)\) is the evidence, the probability of observing the data; \(\mbox{Pr}(H)\) is the prior, the probability of the hypothesis being true before considering the data; \(\mbox{Pr}(H \mid E)\) is the posterior, the probability of the hypothesis being true given the data; and \(\mbox{Pr}(E \mid H)\) is the likelihood, the probability of the data occurring if the hypothesis is true.
For example, the expression \(\mbox{Pr}(\mbox{mother murderer} \mid \mbox{dead children})\) is read as the probability that a mother is the murderer of her children, given that two of her children are found dead with no evidence of physical harm.
\[ \mbox{Pr}(H \mid E) = \mbox{Pr}(E \mid H) \cdot \frac{\mbox{Pr}(H)}{\mbox{Pr}(E)} = \mbox{Pr}(E \mid H) \cdot \frac{\mbox{Pr}(H)}{\mbox{Pr}(E \mid H) \cdot\mbox{Pr}(H) + \mbox{Pr}(E \mid \mbox{not }H) \cdot \mbox{Pr}(\mbox{not }H)} \]
4. Assume that the chance of a son-murdering psychopath finding a way to kill her children, without leaving evidence of physical harm, is:
\[ \mbox{Pr}(A \mid B) = 0.50 \]
with A = two of her children are found dead with no evidence of physical harm and B = a mother is a son-murdering psychopath. Assume that the rate of son-murdering psychopath mothers is 1 in 1,000,000. According to Bayes’ theorem, what is \(\mbox{Pr}(B \mid A)\)?
First, define the necessary terms for the equation:
1. \(\mbox{Pr}(\mbox{mother murderer}) = 1/1000000\),
2. \(\mbox{Pr}(\mbox{dead children} \mid \mbox{mother murderer}) = 1/2\) and
3. \(\mbox{Pr}(\mbox{dead children}) \approx 1/8500 \cdot 1/100= 1/850000\), which approximates the evidence by the probability of two SIDS deaths from question 2.
\[ \mbox{Pr}(\mbox{mother murderer} \mid \mbox{dead children}) = 1/2 \cdot \frac{1/1000000}{1/850000} = 1/2 \cdot \frac{85}{100} = 1/2 \cdot \frac{17}{20} = 0.5\cdot 0.85 = 0.425 \]
The result tells us that the probability that the mother is a son-murdering psychopath, given that two of her children are found dead with no evidence of physical harm, is about 0.425. The expanded formula would also add the term \(\mbox{Pr}(E \mid H) \cdot \mbox{Pr}(H)\) to the denominator, which lowers the result to about 0.30. Either value is in strong contradiction to the press interpretation, under which the probability of innocence would be only 1 in 73 million.
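The calculation can be checked numerically. The sketch below reproduces the value 0.425 and, for comparison, evaluates the expanded form of Bayes’ rule under the assumption that \(\mbox{Pr}(E \mid \mbox{not } H)\) equals the two-SIDS probability from question 2. The variable names are illustrative.

```r
pr_H          <- 1 / 1000000        # prior: rate of son-murdering psychopath mothers
pr_E_given_H  <- 0.5                # likelihood: no evidence of harm, given murder
pr_two_sids   <- 1/8500 * 1/100     # two SIDS deaths, from question 2

# Approximation: evidence taken to be the two-SIDS probability only
posterior_approx <- pr_E_given_H * pr_H / pr_two_sids
posterior_approx   # 0.425

# Expanded denominator, assuming Pr(E | not H) = pr_two_sids
pr_E <- pr_E_given_H * pr_H + pr_two_sids * (1 - pr_H)
posterior_full <- pr_E_given_H * pr_H / pr_E
posterior_full     # about 0.298
```

Under either denominator the posterior is of the order of 0.3 to 0.4, nowhere near certainty of guilt.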
5. After Sally Clark was found guilty, the Royal Statistical Society issued a statement saying that there was “no statistical basis” for the expert’s claim. They expressed concern at the “misuse of statistics in the courts”. Eventually, Sally Clark was acquitted in June 2003. What did the expert miss?
- He made an arithmetic error.
- He made two mistakes. First, he misused the multiplication rule. Second, he did not take into account how rare it is for a mother to murder her children. After using Bayes’ rule, we found a probability closer to 0.5 than 1 in 73 million.
- He mixed up the numerator and denominator of Bayes’ rule.
- He did not use R.
Answer b.
The answer to this question is the combination of the results of questions 1 and 4.
6. Florida is one of the most closely watched states in the U.S. election because it has many electoral votes, and the election is generally close, and Florida tends to be a swing state that can vote either way. Create the following table with the polls taken during the last two weeks:
library(tidyverse)
library(dslabs)
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
  filter(state == "Florida" & enddate >= "2016-11-04" ) %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
Take the average spread of these polls. The CLT tells us this average is approximately normal. Calculate an average and provide an estimate of the standard error. Save your results in an object called results.
library(tidyverse)
library(dslabs)
data(polls_us_election_2016)
polls <- polls_us_election_2016 %>%
filter(state == "Florida" & enddate >= "2016-11-04" ) %>%
mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
polls
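One possible way to finish this exercise is sketched below. It assumes the standard error of the average is estimated by the sample standard deviation of the spreads divided by the square root of the number of polls. The column names avg and se are a choice made here.

```r
library(dplyr)
library(dslabs)
data(polls_us_election_2016)

# Florida polls from the last days before the election
polls <- polls_us_election_2016 %>%
  filter(state == "Florida" & enddate >= "2016-11-04") %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)

# Average spread and its estimated standard error
results <- polls %>%
  summarize(avg = mean(spread), se = sd(spread) / sqrt(n()))
results
```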
7. Now assume a Bayesian model that sets the prior distribution for Florida’s election night spread \(d\) to be Normal with expected value \(\mu\) and standard deviation \(\tau\). What are the interpretations of \(\mu\) and \(\tau\)?
- \(\mu\) and \(\tau\) are arbitrary numbers that let us make probability statements about \(d\).
- \(\mu\) and \(\tau\) summarize what we would predict for Florida before seeing any polls. Based on past elections, we would set \(\mu\) close to 0 because both Republicans and Democrats have won, and \(\tau\) at about \(0.02\), because these elections tend to be close.
- \(\mu\) and \(\tau\) summarize what we want to be true. We therefore set \(\mu\) at \(0.10\) and \(\tau\) at \(0.01\).
- The choice of prior has no effect on Bayesian analysis.
8. The CLT tells us that our estimate of the spread \(\hat{d}\) has normal distribution with expected value \(d\) and standard deviation \(\sigma\) calculated in problem 6. Use the formulas we showed for the posterior distribution to calculate the expected value of the posterior distribution if we set \(\mu = 0\) and \(\tau = 0.01\).
9. Now compute the standard deviation of the posterior distribution.
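A minimal sketch for problems 8 and 9, assuming the usual normal-normal formulas: the posterior expected value is \(B\mu + (1-B)\hat{d}\) with \(B = \sigma^2/(\sigma^2+\tau^2)\), and the posterior standard deviation is \(\sqrt{1/(1/\sigma^2 + 1/\tau^2)}\). The values of avg and se are placeholders and should be replaced by the estimates from problem 6.

```r
# Placeholder inputs (hypothetical); use results$avg and results$se from problem 6
avg <- 0.004
se  <- 0.007

# Prior parameters given in problem 8
mu  <- 0
tau <- 0.01

# Shrinkage weight given to the prior mean
B <- se^2 / (se^2 + tau^2)

# Posterior expected value and standard deviation
posterior_mean <- B * mu + (1 - B) * avg
posterior_se   <- sqrt(1 / (1 / se^2 + 1 / tau^2))

c(B = B, posterior_mean = posterior_mean, posterior_se = posterior_se)
```

The posterior standard deviation is always smaller than both se and tau, which reflects that combining the prior with the polls reduces uncertainty.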
10. Using the fact that the posterior distribution is normal, create an interval that has a 95% probability of occurring centered at the posterior expected value. Note that we call these credible intervals.
11. According to this analysis, what was the probability that Trump wins Florida?
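Problems 10 and 11 can be sketched as follows, again with placeholder values for avg and se and assuming the normal-normal posterior formulas. Because the spread is defined as Clinton minus Trump, a Trump win corresponds to \(d < 0\).

```r
# Placeholder inputs (hypothetical); use results$avg and results$se from problem 6
avg <- 0.004
se  <- 0.007
mu  <- 0
tau <- 0.01

# Posterior parameters under the normal-normal model
B <- se^2 / (se^2 + tau^2)
posterior_mean <- B * mu + (1 - B) * avg
posterior_se   <- sqrt(1 / (1 / se^2 + 1 / tau^2))

# 95% credible interval centered at the posterior expected value
credible_interval <- posterior_mean + c(-1, 1) * qnorm(0.975) * posterior_se
credible_interval

# Probability that the spread is negative, i.e. that Trump wins Florida
prob_trump <- pnorm(0, mean = posterior_mean, sd = posterior_se)
prob_trump
```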
12. Now use the sapply function to change the prior variance from seq(0.05, 0.05, len = 100) and observe how the probability changes by making a plot.
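A sketch of this sensitivity check is given below. As printed, seq(0.05, 0.05, len = 100) yields a constant grid, so the range from 0.005 to 0.05 used here is an assumption. The values of avg and se are placeholders for the estimates from problem 6, and p_calc is a helper name introduced for illustration.

```r
# Placeholder inputs (hypothetical); use results$avg and results$se from problem 6
avg <- 0.004
se  <- 0.007
mu  <- 0

# Assumed grid of prior standard deviations
taus <- seq(0.005, 0.05, length.out = 100)

# Probability that d < 0 (Trump wins) for a given tau
p_calc <- function(tau) {
  B <- se^2 / (se^2 + tau^2)
  posterior_mean <- B * mu + (1 - B) * avg
  posterior_se   <- sqrt(1 / (1 / se^2 + 1 / tau^2))
  pnorm(0, mean = posterior_mean, sd = posterior_se)
}

ps <- sapply(taus, p_calc)
plot(taus, ps, type = "l", xlab = "tau", ylab = "Pr(d < 0)")
```

With a positive average spread, a wider prior lets the polls dominate, so the probability of a negative spread falls as tau grows.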
1. Create this table:
library(tidyverse)
library(dslabs)
data("polls_us_election_2016")
polls <- polls_us_election_2016 %>%
  filter(state != "U.S." & enddate >= "2016-10-31") %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)
Now for each poll use the CLT to create a 95% confidence interval for the spread reported by each poll. Call the resulting object cis with columns lower and upper for the limits of the confidence intervals. Use the select function to keep the columns state, startdate, enddate, pollster, grade, spread, lower, upper.
2. You can add the final result to the cis table you just created using the left_join function like this:
add <- results_us_election_2016 %>%
  mutate(actual_spread = clinton/100 - trump/100) %>%
  select(state, actual_spread)
cis <- cis %>%
  mutate(state = as.character(state)) %>%
  left_join(add, by = "state")
Now determine how often the 95% confidence interval includes the actual result.
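Exercises 1 and 2 could be approached as in the sketch below. It assumes a two-party approximation in which the proportion for Clinton is (spread + 1)/2, so that the standard error of the spread is \(2\sqrt{\hat{X}(1-\hat{X})/N}\). The names X_hat, se, hit and p_hits are choices made here, and na.rm = TRUE is used in case a poll has no matching state result.

```r
library(dplyr)
library(dslabs)
data("polls_us_election_2016")

polls <- polls_us_election_2016 %>%
  filter(state != "U.S." & enddate >= "2016-10-31") %>%
  mutate(spread = rawpoll_clinton/100 - rawpoll_trump/100)

# 95% confidence interval for each poll (two-party approximation, an assumption)
cis <- polls %>%
  mutate(X_hat = (spread + 1) / 2,
         se = 2 * sqrt(X_hat * (1 - X_hat) / samplesize),
         lower = spread - qnorm(0.975) * se,
         upper = spread + qnorm(0.975) * se) %>%
  select(state, startdate, enddate, pollster, grade, spread, lower, upper)

# Attach the actual election result for each state
add <- results_us_election_2016 %>%
  mutate(actual_spread = clinton/100 - trump/100) %>%
  select(state, actual_spread)
cis <- cis %>%
  mutate(state = as.character(state)) %>%
  left_join(add, by = "state")

# Proportion of intervals that contain the actual spread
p_hits <- cis %>%
  mutate(hit = lower <= actual_spread & upper >= actual_spread) %>%
  summarize(proportion_hits = mean(hit, na.rm = TRUE))
p_hits
```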
3. Repeat this, but show the proportion of hits for each pollster. Show only pollsters with more than 5 polls and order them from best to worst. Show the number of polls conducted by each pollster and the FiveThirtyEight grade of each pollster. Hint: use n=n(), grade = grade[1] in the call to summarize.
4. Repeat exercise 3, but instead of pollster, stratify by state. Note that here we can’t show grades.
5. Make a barplot based on the result of exercise 4. Use coord_flip.
6. Add two columns to the cis table by computing, for each poll, the difference between the predicted spread and the actual spread, and define a column hit that is true if the signs are the same. Hint: use the function sign. Call the object resids.
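The two new columns can be illustrated on a small toy table. The numbers below are hypothetical and only stand in for the spread and actual_spread columns of cis; the column name error is a choice made here.

```r
# Toy stand-in for cis (hypothetical values, not real poll data)
toy_cis <- data.frame(
  spread        = c(0.03, -0.02, 0.01),
  actual_spread = c(0.01,  0.02, -0.04)
)

# error: predicted minus actual spread; hit: TRUE when both have the same sign
resids <- transform(
  toy_cis,
  error = spread - actual_spread,
  hit   = sign(spread) == sign(actual_spread)
)
resids
```

On the real table the same two expressions can be placed inside a mutate call on cis.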
7. Create a plot like in exercise 5, but for the proportion of times the sign of the spread agreed.
8. In exercise 7, we see that for most states the polls had it right 100% of the time. For only 9 states did the polls miss more than 25% of the time. In particular, notice that in Wisconsin every single poll got it wrong. In Pennsylvania and Michigan more than 90% of the polls had the signs wrong. Make a histogram of the errors. What is the median of these errors?
9. We see that at the state level, the median error was 3% in favor of Clinton. The distribution is not centered at 0, but at 0.03. This is the general bias we described in the section above. Create a boxplot to see if the bias was general to all states or it affected some states differently. Use filter(grade %in% c("A+","A","A-","B+") | is.na(grade)) to only include pollsters with high grades.
10. Some of these states only have a few polls. Repeat exercise 9, but only include states with 5 good polls or more. Hint: use group_by, filter then ungroup. You will see that the West (Washington, New Mexico, California) underestimated Hillary’s performance, while the Midwest (Michigan, Pennsylvania, Wisconsin, Ohio, Missouri) overestimated it. In our simulation, we did not model this behavior since we added general bias, rather than a regional bias. Note that some pollsters may now be modeling correlation between similar states and estimating this correlation from historical data. To learn more about this, you can learn about random effects and mixed models.
https://rafalab.github.io/dsbook/models.html#data-driven-model
https://rafalab.github.io/dsbook/models.html#data-driven-model
https://rafalab.github.io/dsbook/models.html#pollster-bias
Source: “https://fivethirtyeight.com/features/when-house-effects-become-bias/” 07.08.2921
https://rafalab.github.io/dsbook/models.html#data-driven-model
https://rafalab.github.io/dsbook/models.html#pollster-bias
https://rafalab.github.io/dsbook/inference.html#confidence-intervals-for-the-odds-ratio
https://rafalab.github.io/dsbook/inference.html#exercises-29